Unraveling the Impact of Randomization Techniques: A Case Study on Uber’s Tipping Experiment

python
university of michigan
randomization
statistical analysis
This post continues my notes on experimental design and analysis. Here I discuss randomization techniques, using the tipping experiment that Uber ran for its drivers a couple of years ago as a use case.
Author

kakamana

Published

March 25, 2024


This notebook is part of my learning journey through the graduate applied data science program at the University of Michigan, along with courses from DataCamp, Coursera, LinkedIn, and others.

This notebook continues my notes on experimental design and analysis. Here I discuss randomization techniques, using the tipping experiment that Uber ran for its drivers a couple of years ago as a use case.

You can refer to my story on Medium and my article on LinkedIn for more details.

Introduction

Randomization is a cornerstone of experimental design: because chance alone decides who receives the treatment, the groups being compared differ only randomly, which guards against selection bias and confounding. This post explores three randomization techniques: access, timing, and encouragement randomization. Each one suits a different constraint on who can receive a treatment and when, and each supports causal conclusions that are less exposed to bias. To bring these concepts to life, we use the Uber tipping experiment as a running example and show how each technique could be applied to it. Python snippets illustrate the assignment step for each technique, and a regression model shows how the resulting data can be analyzed.

Access Randomization

At the heart of access randomization lies the random assignment of subjects to treatment and control groups. Only the treatment group gets access to the intervention, so any systematic difference in outcomes between the two groups can be attributed to it.

Uber Tipping Experiment Application:

Imagine a scenario where Uber wishes to assess the impact of a new tipping feature on driver satisfaction. Utilizing access randomization, drivers are bifurcated into two cohorts; one gains access to the tipping functionality (treatment), while the other continues without it (control).

Code
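import random

def access_randomization(drivers):
    """Randomly split drivers into equal-sized treatment and control groups."""
    drivers = list(drivers)  # random.sample needs a sequence, not a NumPy array
    treatment_group = random.sample(drivers, len(drivers) // 2)
    treated = set(treatment_group)  # set lookup avoids an O(n^2) scan
    control_group = [driver for driver in drivers if driver not in treated]
    return treatment_group, control_group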
Code
import numpy as np

# Simulating driver IDs
drivers = np.arange(1, 101)

# Randomly assigning drivers to treatment and control groups
np.random.shuffle(drivers)
treatment_group = drivers[:50]  # First 50 drivers
control_group = drivers[50:]  # Remaining drivers
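
Once drivers are split, the two groups can be compared directly. The following is a minimal sketch of that comparison using a difference in means. The satisfaction scores are simulated, and the baseline of 3.0 and the effect of 0.5 are assumptions chosen for illustration, not figures from Uber. The helper names `assign_access` and `difference_in_means` are hypothetical.

```python
import random
import statistics

def assign_access(driver_ids, seed=0):
    """Split driver IDs into equal-sized treatment and control groups."""
    rng = random.Random(seed)  # seeded so the assignment can be reproduced
    shuffled = list(driver_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def difference_in_means(treatment_outcomes, control_outcomes):
    """Estimate the average treatment effect as a simple difference in means."""
    return statistics.mean(treatment_outcomes) - statistics.mean(control_outcomes)

treatment, control = assign_access(range(1, 101), seed=42)

# Simulated satisfaction scores (assumption): baseline 3.0, plus 0.5 for treated drivers
rng = random.Random(7)
treatment_scores = [3.0 + 0.5 + rng.gauss(0, 1) for _ in treatment]
control_scores = [3.0 + rng.gauss(0, 1) for _ in control]

estimate = difference_in_means(treatment_scores, control_scores)
print(f"Estimated effect of access: {estimate:.2f}")
```

Seeding the generator is a design choice that makes the assignment auditable: rerunning with the same seed reproduces exactly the same groups.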

Timing Randomization:

Timing randomization randomizes when each subject receives the treatment rather than whether they receive it. It is particularly useful when every subject will eventually be treated: because the order is random, subjects who have not yet been treated serve as a comparison group for those who have.

Uber Tipping Experiment Application:

In applying this to our Uber case, let’s consider a phased rollout of the tipping feature. Drivers are randomly assigned to different phases, ensuring an unbiased evaluation of the feature’s impact over time.

Code
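import random

def timing_randomization(drivers, num_waves):
    """Randomly assign drivers to exactly num_waves rollout waves."""
    shuffled = list(drivers)  # copy so the caller's sequence is not mutated
    random.shuffle(shuffled)
    # Striding yields exactly num_waves waves whose sizes differ by at most one,
    # even when len(drivers) is not divisible by num_waves.
    waves = [shuffled[i::num_waves] for i in range(num_waves)]
    return waves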
Code
# Defining phases for feature rollout
phases = ['Phase 1', 'Phase 2', 'Phase 3']

# Assigning drivers to phases
driver_phases = np.random.choice(phases, size=len(drivers))

# Analyzing the distribution of drivers across phases
for phase in phases:
    print(f"{phase} Drivers:", np.sum(driver_phases == phase))
Phase 1 Drivers: 31
Phase 2 Drivers: 38
Phase 3 Drivers: 31
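
A randomized rollout supports evaluation because, at any point in time, drivers whose phase has not started yet act as a comparison group for those who already have the feature. The sketch below illustrates this under two assumptions: one phase goes live per period, and phases are balanced rather than drawn independently. The helper names `assign_phases` and `exposure_by_period` are hypothetical.

```python
import random

def assign_phases(driver_ids, num_phases, seed=0):
    """Map each driver to a phase 1..num_phases with near-equal phase sizes."""
    rng = random.Random(seed)
    shuffled = list(driver_ids)
    rng.shuffle(shuffled)
    # Dealing shuffled drivers out in turn keeps phase sizes within one of each other
    return {driver: (i % num_phases) + 1 for i, driver in enumerate(shuffled)}

def exposure_by_period(phase_of, num_phases):
    """For each period, list who already has the feature and who is still waiting."""
    schedule = {}
    for period in range(1, num_phases + 1):
        treated = [d for d, phase in phase_of.items() if phase <= period]
        waiting = [d for d, phase in phase_of.items() if phase > period]
        schedule[period] = (treated, waiting)
    return schedule

phase_of = assign_phases(range(1, 101), num_phases=3, seed=42)
schedule = exposure_by_period(phase_of, num_phases=3)
for period, (treated, waiting) in schedule.items():
    print(f"Period {period}: {len(treated)} with tipping, {len(waiting)} still waiting")
```

With 100 drivers and three phases, this yields phases of 34, 33, and 33 drivers regardless of the seed, and the waiting group shrinks to zero in the final period.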

Encouragement Randomization:

Encouragement randomization applies when the treatment is available to everyone, so access itself cannot be randomized. Instead, a randomly selected subgroup receives a nudge to take up the treatment. Comparing the encouraged and non-encouraged groups then reveals the effect of the encouragement on treatment uptake.

Uber Tipping Experiment Application:

In this context, while the tipping feature is available to all drivers, a randomized subset receives motivational messages or incentives to encourage the use of this feature, aiding in the assessment of encouragement’s effectiveness.

Code
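import random

def encouragement_randomization(drivers, encouragement_ratio):
    """Randomly pick a share of drivers to receive the encouragement."""
    drivers = list(drivers)  # random.sample needs a sequence, not a NumPy array
    num_encouraged = int(len(drivers) * encouragement_ratio)
    encouraged_group = random.sample(drivers, num_encouraged)
    encouraged = set(encouraged_group)  # set lookup avoids an O(n^2) scan
    not_encouraged_group = [driver for driver in drivers if driver not in encouraged]
    return encouraged_group, not_encouraged_group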
Code
# Defining encouragement proportion
encouragement_ratio = 0.3

# Randomly selecting drivers for encouragement
encouraged_drivers = np.random.choice(drivers, size=int(len(drivers) * encouragement_ratio), replace=False)

# Examining the encouraged group
print("Encouraged Drivers:", encouraged_drivers)
Encouraged Drivers: [66 41 47 75  5 34 54 17 20 59 73 24 95 29 16 65 44 23 45 11 70 25 76 74
 80 63 85 50 38 72]
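
Because encouragement only shifts the probability of using the feature, the analysis usually compares groups by assignment rather than by actual usage. The sketch below simulates that comparison under assumed numbers, none of which come from Uber: 30% baseline uptake, 70% uptake when encouraged, and a tip gain of 1.0 from using the feature. It computes the effect of encouragement on uptake, its effect on tips, and their ratio. Under the usual instrumental-variable assumptions, that ratio (the Wald estimate) approximates the effect of usage for drivers who respond to the nudge.

```python
import random
import statistics

rng = random.Random(42)
n_drivers = 10_000  # large simulated sample so the estimates are stable

rows = []
for _ in range(n_drivers):
    encouraged = rng.random() < 0.3  # each driver is nudged with probability 0.3
    # Assumed uptake: 30% baseline, 70% if encouraged
    uses_feature = rng.random() < (0.7 if encouraged else 0.3)
    # Assumed outcome: baseline tips of 2.0, plus 1.0 when the feature is used
    tips = 2.0 + (1.0 if uses_feature else 0.0) + rng.gauss(0, 1)
    rows.append((encouraged, uses_feature, tips))

def group_mean(rows, encouraged_flag, column):
    """Mean of one column among drivers with the given encouragement status."""
    return statistics.mean(float(r[column]) for r in rows if r[0] == encouraged_flag)

# Effect of encouragement on uptake (first stage)
uptake_effect = group_mean(rows, True, 1) - group_mean(rows, False, 1)
# Effect of encouragement on tips (intent-to-treat)
itt_effect = group_mean(rows, True, 2) - group_mean(rows, False, 2)
# Wald estimate: effect of using the feature for drivers moved by the nudge
wald_estimate = itt_effect / uptake_effect

print(f"Uptake effect: {uptake_effect:.2f}")
print(f"Intent-to-treat effect on tips: {itt_effect:.2f}")
print(f"Wald estimate: {wald_estimate:.2f}")
```

Comparing by assignment keeps the benefit of randomization. Comparing drivers who chose to use the feature against those who did not would reintroduce self-selection.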

Statistical Analysis with Regression:

To delve deeper into the data’s story, a regression analysis can provide quantifiable insights into the factors influencing tipping behavior.

Regression Model Summary:

Code
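import numpy as np
import pandas as pd
import statsmodels.api as sm

# 'df' is never defined in this notebook, so build a synthetic stand-in.
# The effect sizes are illustrative assumptions, not results from Uber.
rng = np.random.default_rng(42)
n = 100
df = pd.DataFrame({'treated': rng.integers(0, 2, n), 'encouraged': rng.integers(0, 2, n)})
df['tip_amount'] = 2.0 + 1.0 * df['treated'] + 0.5 * df['encouraged'] + rng.normal(0, 1, n)

# Intercept plus the two randomized assignment indicators
X = sm.add_constant(df[['treated', 'encouraged']])
y = df['tip_amount']

# Fitting an OLS regression model
model = sm.OLS(y, X).fit()

# Displaying the model summary
print(model.summary())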

This model estimates how being in the treatment group and receiving encouragement each relate to the tip amount. Additional covariates can be added to X to control for other variables as necessary.

Quick References

Access Randomization:

  • “Randomized Controlled Trials: Design and Implementation for Community-Based Psychosocial Interventions” by Phyllis Solomon, Mary M. Cavanaugh, and Jeffrey Draine. This book offers insights into the application of randomized controlled trials in community settings. [1](https://www.amazon.com/Randomized-Controlled-Trials-Implementation-Community-Based/dp/0195333195)
  • “Design and Analysis of Experiments” by Douglas C. Montgomery. This classic text provides a comprehensive overview of experimental design principles, including randomization techniques. [2](https://www.wiley.com/en-ae/Design+and+Analysis+of+Experiments%2C+10th+Edition-p-9781119492443)

Timing Randomization:

  • “The Handbook of Experimental Economics, Volume 2” by John H. Kagel and Alvin E. Roth, eds. This collection includes chapters that discuss the importance of timing in experimental economics. [3](https://www.jstor.org/stable/j.ctvc77b40)
  • “Field Experiments and Their Critics: Essays on the Uses and Abuses of Experimentation in the Social Sciences” by Dawn Langan Teele. This work debates the use of field experiments and discusses timing randomization among other topics. [4](https://yalebooks.yale.edu/book/9780300169409/field-experiments-and-their-critics/)

Encouragement Randomization:

  • “Mostly Harmless Econometrics: An Empiricist’s Companion” by Joshua D. Angrist and Jörn-Steffen Pischke. This book simplifies the application of econometric techniques, including instrumental variables often used in encouragement randomization. [5](https://www.jstor.org/stable/j.ctvcm4j72)
  • “Instrumental Variables: An Econometrician’s Perspective” by Guido W. Imbens. A detailed discussion on instrumental variables, closely related to encouragement randomization, providing a solid statistical foundation. [6](https://www.jstor.org/stable/43288511)

General Experimental Design and Analysis:

  • “Experimental and Quasi-Experimental Designs for Generalized Causal Inference” by William R. Shadish, Thomas D. Cook, and Donald T. Campbell. A key text for understanding the principles behind experimental design and causal inference. [7](https://www.amazon.com/Experimental-Quasi-Experimental-Designs-Generalized-Inference/dp/0395615569)
  • “Python for Data Analysis” by Wes McKinney. Ideal for readers looking to implement experimental design techniques in Python, written by the creator of the pandas library. [8](https://wesmckinney.com/book/)